General-purpose Architectures for Media Processing and Database Applications

Authors

  • Parthasarathy Ranganathan
  • Joseph R. Cavallaro
  • Keith D. Cooper
  • Norman P. Jouppi
  • Willy E. Zwaenepoel
  • Noah Harding
Abstract

Workloads on general-purpose computing systems have changed dramatically over the past few years, with greater emphasis on emerging compute-intensive applications such as media processing and databases. However, until recently, most high-performance computing studies have focused primarily on scientific and engineering workloads, potentially leading to designs not suitable for these emerging workloads. This dissertation addresses this limitation. Our key contributions include (i) the first detailed quantitative simulation-based studies of the performance of media processing and database workloads on systems using state-of-the-art processors, and (ii) cost-effective architectural solutions targeted at achieving the higher performance requirements of future systems running these workloads. The first part of the dissertation focuses on media processing workloads. We study the effectiveness of state-of-the-art features (techniques to extract instruction-level parallelism, media instruction-set extensions, software prefetching, and large caches). Our results identify two key trends: (i) media workloads on current general-purpose systems are primarily compute-bound and (ii) current trends towards devoting a large fraction of on-chip transistors (up to 80%) to caches can often be ineffective for media workloads. In response to these trends, we propose and evaluate a new cache organization, called reconfigurable caches. Reconfigurable caches allow the on-chip cache transistors to be dynamically divided into partitions that can be used for other activities (e.g., instruction memoization, application-controlled memory, and prefetching buffers), including optimizations that address the compute bottleneck. Our design of the reconfigurable cache requires relatively few modifications to existing cache structures and has a small impact on cache access times.
The second part of the dissertation evaluates the performance of database workloads such as online transaction processing and decision support systems on shared-memory multiprocessor servers with state-of-the-art processors. Our main results show that the key performance-limiting characteristics of online transaction processing workloads are (i) large instruction footprints (leading to instruction cache misses) and (ii) frequent data communication (leading to cache-to-cache misses). We show that both of these inefficiencies can be addressed with simple cost-effective optimizations. Additionally, our analysis of optimized memory consistency models with state-of-the-art processors suggests that the choice of the hardware consistency model of the system may not be a dominant factor for database workloads.

Similar resources

Stream Processing in General-Purpose Processors

To date stream processing has been applied to a variety of special purpose hardware architectures including stream processors, DSP, and graphics engines. We believe that the stream processing programming paradigm will also be a win for general-purpose processors, for executing both applications that have been identified previously for streaming such as media processing, as well as for wider cla...

Ultra-Low-Energy DSP Processor Design for Many-Core Parallel Applications

Background and Objectives: Digital signal processors are widely used in energy constrained applications in which battery lifetime is a critical concern. Accordingly, designing ultra-low-energy processors is a major concern. In this work and in the first step, we propose a sub-threshold DSP processor. Methods: As our baseline architecture, we use a modified version of an existing ultra-low-power...

A Novel Multiply-Accumulator Unit Bus Encoding Architecture for Image Processing Applications

In CMOS circuits, power dissipation is a major concern for VLSI functional units. With shrinking feature size, increased frequency and power dissipation on the data bus have become the most important factor compared to other parts of the functional units. One of the most important functional units in any processor is the Multiply-Accumulator unit (MAC). The current work focuses on the develop...

An MPEG-4 performance study for non-SIMD, general purpose architectures

MPEG-4 is an important international standard with wide applicability. This paper focuses on MPEG-4’s main profile, video, whose approach allows more efficiency in coding and more flexibility in managing heterogeneous media objects than previous MPEG standards. This study presents evidence to support the assertion that for non-SIMD architectures and computational models, most memory-system opti...

Parallel Search On Video Cards

Recent approaches exploiting the massively parallel architecture of graphics processors (GPUs) to accelerate database operations have achieved intriguing results. While parallel sorting received significant attention, parallel search has not been explored. With p-ary search we present a novel parallel search algorithm for large-scale database index operations that scales with the number of proc...

Publication date: 2000